{
 "cells": [
  {
   "cell_type": "raw",
   "id": "cb6f552e-775f-4d84-bc7c-dca94c06a33c",
   "metadata": {},
   "source": [
    "---\n",
    "title: Tagging\n",
    "sidebar_class_name: hidden\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0507a4b",
   "metadata": {},
   "source": [
    "# Classify Text into Labels\n",
    "\n",
    "Tagging means labeling a document with classes such as:\n",
    "\n",
    "- sentiment\n",
    "- language\n",
    "- style (formal, informal etc.)\n",
    "- covered topics\n",
    "- political tendency\n",
    "\n",
    "![Image description](../../static/img/tagging.png)\n",
    "\n",
    "## Overview\n",
    "\n",
    "Tagging has a few components:\n",
    "\n",
    "* `function`: Like [extraction](/docs/tutorials/extraction), tagging uses [functions](https://openai.com/blog/function-calling-and-other-api-updates) to specify how the model should tag a document\n",
    "* `schema`: defines how we want to tag the document\n",
    "\n",
    "## Quickstart\n",
    "\n",
    "Let's see a very straightforward example of how we can use OpenAI tool calling for tagging in LangChain. We'll use the `.withStructuredOutput()` method supported by OpenAI models:\n",
    "\n",
    "\n",
    "```{=mdx}\n",
    "import Npm2Yarn from \"@theme/Npm2Yarn\"\n",
    "\n",
    "<Npm2Yarn>\n",
    "  langchain @langchain/openai @langchain/core zod\n",
    "</Npm2Yarn>\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8ca3f93",
   "metadata": {},
   "source": [
    "Let's specify a [Zod](https://zod.dev) schema with a few properties and their expected type in our schema."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "39f3ce3e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import { ChatPromptTemplate } from \"@langchain/core/prompts\";\n",
    "import { ChatOpenAI } from \"@langchain/openai\";\n",
    "import { z } from \"zod\";\n",
    "\n",
    "const taggingPrompt = ChatPromptTemplate.fromTemplate(\n",
    "    `Extract the desired information from the following passage.\n",
    "\n",
    "Only extract the properties mentioned in the 'Classification' function.\n",
    "\n",
    "Passage:\n",
    "{input}\n",
    "`\n",
    ");\n",
    "\n",
    "const classificationSchema = z.object({\n",
    "    sentiment: z.string().describe(\"The sentiment of the text\"),\n",
    "    aggressiveness: z.number().int().min(1).max(10).describe(\n",
    "        \"How aggressive the text is on a scale from 1 to 10\"\n",
    "    ),\n",
    "    language: z.string().describe(\"The language the text is written in\"),\n",
    "});\n",
    "\n",
    "// LLM\n",
    "const llm = new ChatOpenAI({\n",
    "    temperature: 0,\n",
    "    model: \"gpt-3.5-turbo-0125\",\n",
    "});\n",
    "// Name is optional, but gives the models more clues as to what your schema represents\n",
    "const llmWihStructuredOutput = llm.withStructuredOutput(classificationSchema, { name: \"extractor\" })\n",
    "\n",
    "const taggingChain = taggingPrompt.pipe(llmWihStructuredOutput);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5509b6a6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{ sentiment: \u001b[32m\"positive\"\u001b[39m, aggressiveness: \u001b[33m1\u001b[39m, language: \u001b[32m\"Spanish\"\u001b[39m }"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const input = \"Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!\"\n",
    "await taggingChain.invoke({ input })"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d921bb53",
   "metadata": {},
   "source": [
    "As we can see in the example, it correctly interprets what we want.\n",
    "\n",
    "The results vary so that we may get, for example, sentiments in different languages ('positive', 'enojado' etc.).\n",
    "\n",
    "We will see how to control these results in the next section."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bebb2f83",
   "metadata": {},
   "source": [
    "## Finer control\n",
    "\n",
    "Careful schema definition gives us more control over the model's output. \n",
    "\n",
    "Specifically, we can define:\n",
    "\n",
    "- possible values for each property\n",
    "- description to make sure that the model understands the property\n",
    "- required properties to be returned"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69ef0b9a",
   "metadata": {},
   "source": [
    "Let's redeclare our Zod schema to control for each of the previously mentioned aspects using enums:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6a5f7961",
   "metadata": {},
   "outputs": [],
   "source": [
    "import { z } from \"zod\";\n",
    "\n",
    "const classificationSchema = z.object({\n",
    "    sentiment: z.enum([\"happy\", \"neutral\", \"sad\"]).describe(\"The sentiment of the text\"),\n",
    "    aggressiveness: z.number().int().min(1).max(5).describe(\n",
    "        \"describes how aggressive the statement is, the higher the number the more aggressive\"\n",
    "    ),\n",
    "    language: z.enum([\"spanish\", \"english\", \"french\", \"german\", \"italian\"]).describe(\"The language the text is written in\"),\n",
    "});"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e5a5881f",
   "metadata": {},
   "outputs": [],
   "source": [
    "const taggingPrompt = ChatPromptTemplate.fromTemplate(\n",
    "`Extract the desired information from the following passage.\n",
    "\n",
    "Only extract the properties mentioned in the 'Classification' function.\n",
    "\n",
    "Passage:\n",
    "{input}\n",
    "`\n",
    ")\n",
    "\n",
    "// LLM\n",
    "const llm = new ChatOpenAI({\n",
    "    temperature: 0,\n",
    "    model: \"gpt-3.5-turbo-0125\",\n",
    "});\n",
    "const llmWihStructuredOutput = llm.withStructuredOutput(classificationSchema, { name: \"extractor\" })\n",
    "\n",
    "const chain = taggingPrompt.pipe(llmWihStructuredOutput);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ded2332",
   "metadata": {},
   "source": [
    "Now the answers will be restricted in a way we expect!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d9b9d53d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{ sentiment: \u001b[32m\"happy\"\u001b[39m, aggressiveness: \u001b[33m3\u001b[39m, language: \u001b[32m\"spanish\"\u001b[39m }"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const input = \"Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!\"\n",
    "await chain.invoke({ input })"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "1c12fa00",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{ sentiment: \u001b[32m\"sad\"\u001b[39m, aggressiveness: \u001b[33m5\u001b[39m, language: \u001b[32m\"spanish\"\u001b[39m }"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const input = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
    "await chain.invoke({ input })"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0bdfcb05",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{ sentiment: \u001b[32m\"neutral\"\u001b[39m, aggressiveness: \u001b[33m3\u001b[39m, language: \u001b[32m\"english\"\u001b[39m }"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const input = \"Weather is ok here, I can go outside without much more than a coat\"\n",
    "await chain.invoke({ input })"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf6b7389",
   "metadata": {},
   "source": [
    "The [LangSmith trace](https://smith.langchain.com/public/455f5404-8784-49ce-8851-0619b99e936f/r) lets us peek under the hood:\n",
    "\n",
    "![](../../static/img/classification_ls_trace.png)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Deno",
   "language": "typescript",
   "name": "deno"
  },
  "language_info": {
   "file_extension": ".ts",
   "mimetype": "text/x.typescript",
   "name": "typescript",
   "nb_converter": "script",
   "pygments_lexer": "typescript",
   "version": "5.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
